The development and popularity of voice-user interfaces have made spontaneous speech processing an important research field. One of the main focus areas in this field is automatic speech recognition (ASR), which enables computers to recognize spoken language and convert it into text. However, ASR systems often perform worse on spontaneous speech than on read speech, since the former differs from other types of speech in many ways, and the presence of speech disfluencies is one of its most prominent characteristics. These phenomena are an important feature of human-human communication and, at the same time, a challenging obstacle for speech processing tasks. In this paper we address the detection of voiced hesitations (filled pauses and sound lengthenings) in Russian spontaneous speech using different machine learning techniques, from grid search and gradient descent in rule-based approaches to data-driven ones such as extreme learning machines (ELM) and support vector machines (SVM) based on automatically extracted acoustic features. Experimental results on a mixed, quality-diverse corpus of spontaneous Russian speech indicate the efficiency of these techniques for the task in question, with SVM outperforming the other methods.